(54) METHOD AND DEVICE FOR DETECTING CHARACTERISTIC SCENE FOR DYNAMIC
IMAGE

(57) Abstract:

PROBLEM TO BE SOLVED: To obtain information assisting the user in
video image selection by discriminating whether or not a scene is an
important scene in a video image, specifying its range simply and at
high speed, and discriminating and classifying the field to which a
video image belongs (news, sports relay or the like).

SOLUTION: This device is provided with a means for entering an object
dynamic image into a processor in time series in units of frames, a
means for buffering plural frames entered in the past in the
processing unit, and a means for discriminating whether or not a
feature variable of the buffered frames has a characteristic in which
the feature variable approaches that of the newest frame monotonously
in order from the older frames. Furthermore, when the discrimination
means returns true, a means is provided to extract, as an important
scene, a video period until a succeeding special video effect is
detected or until a prescribed time has elapsed after the frame.
Moreover, in addition to the special video effect, a means for
detecting changes and states of various video images (change of cut,
display of caption, layout or the like) and a means for discriminating
kinds of video images based on their order of appearance and
combination are provided.

COPYRIGHT: (C)1997,JPO

(57) [Abstract]

[Object] To judge whether or not a scene is an important scene in a
video image, and to specify its range simply and at high speed.
Besides, to judge to which field (news, sports relay, or the like) a
video image belongs and classify it, so as to provide information
assisting the user in selecting the video image.

[Constitution] There are provided means for inputting a dynamic image
as an object to a processing unit in time series in frame units, means
for buffering plural frames inputted to the processing unit in the
past, and means for judging whether or not a feature variable of the
buffered frames has a characteristic in which the feature variable
approaches that of the newest frame monotonously in order from the
older frames. Furthermore, when the judgement made by the judgement
means is true, it is judged that there is a special video effect, and
there is provided means for extracting, as an important scene, a video
period until a definite time elapses after the frame or until a
succeeding special video effect is detected. Moreover, means for
detecting a change and a state of various video images (change of cut,
display of caption, layout or the like) in addition to the special
video effect is also provided, and means for discriminating kinds of
video images based on the appearing order and combination of those is
provided.

[0011] FIG. 1 is an example of a schematic block diagram of a system
structure for realizing the present invention. Reference numeral 1
designates a display device such as a CRT, which displays an output
screen of a computer 4. Instructions to the computer 4 can be given
by using an input device 5 such as a keyboard or a pointing device.
A dynamic image reproduction device 10 is a tuner device for receiving
broadcast programs of terrestrial broadcasting, satellite
broadcasting, cable television or the like, or a device for reproducing
dynamic images recorded on an optical disk, a video tape or the like.
A video signal outputted from the dynamic image reproduction device 10
is sequentially converted into digital image data by an A/D converter
3 and is sent to the computer 4. Inside the computer 4, the digital
image data is inputted to a memory 9 through an interface 8, and is
processed by a CPU 7 in accordance with a program stored in the memory
9. In the case where a number (frame number) is allocated in sequence
from the head of the dynamic image to each frame of the dynamic image
handled by the device 10, sending a frame number to the dynamic image
reproduction device 10 through a control line 2 allows the dynamic
image of that scene to be called up and reproduced. Besides, as
required by the processing, various information can be stored in an
external information storage device 6. Various data prepared by the
processes explained below are stored in the memory 9, and are consulted
as the need arises.

[0012] Hereinafter, a method of detecting a dissolve, which is one
kind of cut change produced by a special video effect, for use in
selecting important scenes will be described in detail.

[0013] FIG. 2 shows an example of a flowchart of a dissolve detection
program for dynamic images executed on the system shown in FIG. 1. The
program is stored in the memory 9, and the CPU 7 sets various variables
necessary for the execution of the program to initial values as an
initializing process (200). Next, 0 is substituted in the respective
elements of m two-dimensional arrays B(x, y) containing brightness
values of respective pixels of past frame images (202). When the size
of the frame image is w x h, x takes a value from 0 to w-1, and y takes
a value from 0 to h-1. In a process 204, a frame image outputted from
the dynamic image reproduction device 10 is taken in. In a process
206, a variable eval, which holds an evaluation value, is set to 0,
and an initial value 0 is substituted in a loop counter. Then, the
following processes 208 to 228 are carried out for all pixels in the
frame image.
[0014] In the processes 208 to 228, properties peculiar to the
dissolve are detected. Here, the dissolve is a cut change having a
period in which, as shown in FIG. 3, frame images A and C of the cuts
before and after the change are mixed, like B. The mixture ratio of A
and C in B is inverted with time from a state where A is 100% and C is
0% at the start of the dissolve, and the dissolve is completed at the
point of time when A finally becomes 0% and C becomes 100%. In the
case of a grayscale image, when a brightness value of A is Ba, a
brightness value of B is Bb, a brightness value of C is Bc, and the
mixture ratio of C is a (0 < a < 1), approximation can be made by the
expression Bb = Ba x (1 - a) + Bc x a. When this expression is
rearranged, Bb = (Bc - Ba) x a + Ba is obtained. In the case of the
dissolve, where the mixture ratio a is monotonously increased from 0,
the value of Bb is also monotonously increased or decreased from Ba to
Bc. Accordingly, if the brightness values of pixels are always stored
in a buffer for the past m frames, and it is checked whether the
brightness value is monotonously increased or decreased over the
period of the m-frame length, the dissolve can be detected.
Experimentally, excellent results are obtained when the value of m is
set to about 8 to 15.
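
The mixing expression in paragraph [0014] can be illustrated with a short Python sketch. The function names `blend` and `is_monotonic` and the sample brightness values are assumptions made for illustration only; they do not appear in the specification.

```python
def blend(ba, bc, a):
    """Brightness Bb of a mixed frame: Bb = Ba * (1 - a) + Bc * a."""
    return ba * (1 - a) + bc * a


def is_monotonic(values):
    """True if the values never decrease, or never increase, along the sequence."""
    pairs = list(zip(values, values[1:]))
    return all(p <= q for p, q in pairs) or all(p >= q for p, q in pairs)


# Hypothetical pixel: brightness 40 in cut A, 200 in cut C, m = 8 buffered frames.
m = 8
ratios = [k / (m - 1) for k in range(m)]        # mixture ratio a rising from 0 to 1
series = [blend(40, 200, a) for a in ratios]    # brightness over the dissolve
```

Because the mixture ratio rises steadily, `series` moves in one direction from Ba to Bc, which is the property the buffer check looks for.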

[0015] First, in a process 208, the brightness value of the pixel at
coordinate (x, y) is substituted in the mth array Bm of the
two-dimensional arrays B storing brightness values of the past frames.
Then, 1 is substituted in a loop counter i, and 0 is substituted in
a variable num. Next, the brightness value B1(x, y) stored in the first
array is compared with the value of the mth array Bm(x, y) (212), and
subsequently it is checked whether or not the brightness value Bi(x,
y) stored in the ith array is larger than the value of the next array
Bi+1(x, y) (214, 216). When B1(x, y) is larger than Bm(x, y), in the
case where Bi(x, y) is larger than Bi+1(x, y), the value of num is
increased by 1. Conversely, when B1(x, y) is smaller than Bm(x, y), in
the case where Bi(x, y) is smaller than Bi+1(x, y), the value of num is
increased by 1 (218). In a subsequent process 220, the value of Bi+1(x,
y) is substituted in Bi(x, y), so that the m arrays B are shifted one
by one in sequence, and the brightness values of the m frames up to
the newest frame are always stored as a buffer. In a process 222, the
loop counter i is increased by 1, and until i becomes larger than m,
the processes are repeated in such a manner that when B1(x, y) was
larger than Bm(x, y) at the point of time of the process 212, the
procedure returns to the process 214, and if not, it returns to the
process 216 (224). When the variable num is larger than a threshold
value th1 (226), it is judged that the pixel at coordinate (x, y) is
increasing or decreasing sufficiently monotonously, and the value of
eval is increased by 1 (228). Natural dynamic images inevitably have
irregular variations, and the speed of the dissolve also becomes
uneven when a person performs the dissolve operation by hand. Thus, a
margin is given by providing the threshold value for the judgement of
monotony. The procedure returns to 208 and is repeated so that the
above processes are performed for all pixels in the frame image (230
to 236). As a result, the number of pixels satisfying the feature of
the dissolve is held in the variable eval. Finally, it is checked
whether or not the variable eval exceeds a threshold th2 (238), and if
it does, it is judged that a dissolve exists, and a dissolve detection
process (240) is executed. Then, the procedure returns to the process
204, and the processes from 204 are repeated to the end of the video
image.
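
The per-pixel counting of processes 208 to 238 can be sketched in Python as follows. This is a simplified sketch under stated assumptions: the buffer is passed in as a list of m frames (oldest first) rather than being shifted inside the loop as in FIG. 2, and the names `count_monotonic_pixels` and `dissolve_detected` are hypothetical.

```python
def count_monotonic_pixels(buffer, th1):
    """Return eval: the number of pixels that change in a consistent direction.

    buffer is a list of m frames, oldest first; each frame is a list of rows
    of brightness values. A pixel counts when more than th1 adjacent frame
    pairs move in the same direction as the change from B1 to Bm.
    """
    m = len(buffer)
    height = len(buffer[0])
    width = len(buffer[0][0])
    eval_count = 0
    for y in range(height):
        for x in range(width):
            first = buffer[0][y][x]     # B1(x, y)
            last = buffer[-1][y][x]     # Bm(x, y)
            num = 0
            for i in range(m - 1):
                cur = buffer[i][y][x]
                nxt = buffer[i + 1][y][x]
                if first > last and cur > nxt:
                    num += 1            # decreasing step in a decreasing pixel
                elif first < last and cur < nxt:
                    num += 1            # increasing step in an increasing pixel
            if num > th1:
                eval_count += 1
    return eval_count


def dissolve_detected(buffer, th1, th2):
    """A dissolve is assumed when eval exceeds the threshold th2."""
    return count_monotonic_pixels(buffer, th1) > th2
```

In a streaming setting the caller would keep only the newest m frames, for example in a `collections.deque` with `maxlen=m`, and call `dissolve_detected` once per incoming frame.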

[0016] In the above method, the variable eval also tends to be rather
high when there is a movement of the camera, such as a zoom or pan.
This is because if the camera moves, the brightness of each pixel
in the frame image changes accordingly, and in such a change there are
many pixels in which the brightness is monotonously increased or
decreased. Thus, there are cases where it is hard to differentiate
between the dissolve and the movement of the camera. In the following,
a dissolve detection method that distinguishes the dissolve more
clearly will therefore be described.
[0017] In general, a dissolve usually lasts 1 second (30 frames in
the case of the NTSC system) or more. Accordingly, during a dissolve,
the value of the variable eval stays high for 22 frames when m = 8, or
for 15 frames even when m = 15. On the other hand, in the case of the
movement of the camera, the value does not become as high as in the
dissolve, and a high state does not necessarily continue. Accordingly,
when the total sum of the values of the variable eval for the past
n frames is taken, a remarkable difference appears between the value
of the sum in the dissolve and the sum in the movement of the camera.
FIG. 4 shows a dissolve detection method in which the above approach
is added.

[0018] First, as an initialization process, various variables
necessary for the execution of a program are set to initial values
(400). Next, 0 is substituted in respective elements of m
two-dimensional arrays B(x, y) containing brightness values of
respective pixels of past frame images, and all n variables E1 to En
storing values of the variable eval of the past n frames are set to 0
(402). When the size of the frame image is w x h, x takes a value from
0 to w-1, and y takes a value from 0 to h-1. In a process 404, the
frame image outputted from the dynamic image reproduction device 10 is
taken in. Next, processes 206 to 236 shown in FIG. 2 are executed to
obtain the variable eval (406). Then, the value of the variable eval is
substituted in En. The total of E1 to En is obtained in sum, and a
shift is made while the value of Ej+1 is substituted in Ej in sequence,
so that the eval values of the newest n frames are always stored in E1
to En (408 to 412). Finally, it is judged whether the sum is larger
than a threshold th3 (414); if it is larger, the dissolve detection
process 240 is performed, and if not, the procedure returns to the
process 404 without doing anything, and is repeated.
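
The running sum over E1 to En can be sketched with a fixed-length queue. This is a minimal sketch, assuming that a `collections.deque` with `maxlen=n` is an acceptable stand-in for the explicit shift of Ej+1 into Ej; the class name `EvalWindow` is hypothetical.

```python
from collections import deque


class EvalWindow:
    """Holds the eval values of the newest n frames (E1 to En)."""

    def __init__(self, n, th3):
        # All n slots start at 0, as in process 402.
        self.values = deque([0] * n, maxlen=n)
        self.th3 = th3

    def push(self, eval_value):
        """Store the newest eval as En and report whether the sum exceeds th3."""
        self.values.append(eval_value)      # the oldest value drops out
        return sum(self.values) > self.th3
```

A short burst of high eval values, as might be caused by camera movement, raises the sum less than a sustained run does, which is the distinction paragraph [0017] relies on.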

[0019] In the dissolve detection process 240, a scene interposed
between dissolves is selected as an important scene. When the dissolve
detection methods of FIGS. 2 and 4 are executed, it is possible to
obtain a graph expressing the time transition of the evaluation value,
like FIG. 5. The evaluation value does not show a large value
instantaneously in the dissolve period, but shows a triangular change
in which it rises rapidly and then falls rapidly. The two vertices at
the base of the triangle substantially correspond to the start point
and the end point of the dissolve. When a digest is prepared, it is
unsightly if a portion where a special video effect such as the
dissolve is applied remains at the head or end. Thus, a period 507
from the point where one dissolve ends to the point before the next
dissolve starts is cut out. For that purpose, in addition to a first
threshold value 500 used for judgement of the dissolve in the above
dissolve detection method, a second threshold value 502 lower than
that is used. In the case where a dissolve is detected as the start
point of an important scene, a point 506 at which the evaluation value
first falls below the second threshold value after a point 504 at
which it exceeds the first threshold value is made the start point of
the important scene. At this time, the start point may be delayed by a
margin. In the case where a dissolve is detected as the end point of
the important scene, when the evaluation value is traced back in time
from a point 510 at which it exceeds the first threshold value, a
point 508 at which the evaluation value first falls below the second
threshold value is made the end point of the important scene. At this
time, similarly to the start point, the end point may be moved earlier
by a margin. To judge whether a detected dissolve indicates the start
point or the end point of the important scene, the time between
dissolves can be used. While normal broadcasting continues there is no
dissolve, so the time interval between dissolves becomes long, whereas
within the important scene the interval is relatively short. By
reproducing the thus obtained important scenes in sequence, a digest
is obtained.
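
The use of the two threshold values 500 and 502 can be sketched as follows. This is a simplified sketch under stated assumptions: evaluation values are given as a list indexed by frame, every gap between two consecutive dissolves is treated as a candidate scene, and neither the optional margins nor the judgement based on the interval between dissolves is modelled. The function names are hypothetical.

```python
def dissolve_spans(values, high, low):
    """Return one (start, end) frame index pair per detected dissolve.

    A dissolve is detected where a value exceeds `high` (threshold 500).
    Its start and end are the nearest frames before and after that point
    where the value is below `low` (threshold 502).
    """
    spans = []
    n = len(values)
    i = 0
    while i < n:
        if values[i] > high:
            start = i
            while start > 0 and values[start] >= low:
                start -= 1                  # trace back in time (point 508)
            end = i
            while end < n - 1 and values[end] >= low:
                end += 1                    # trace forward in time (point 506)
            spans.append((start, end))
            i = end + 1
        else:
            i += 1
    return spans


def important_scenes(values, high, low):
    """Return the periods between the end of one dissolve and the start of the next."""
    spans = dissolve_spans(values, high, low)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(spans, spans[1:])]
```

Using the lower threshold for the boundaries keeps the rising and falling slopes of the triangular peak out of the extracted period.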
[0020] In the above embodiment, the monotonous change of brightness
is checked, but it is also possible to use a similar change of color.
Unlike brightness, which is one-dimensional information, color is
three-dimensional information. Accordingly, a monotonous change cannot
be checked simply on the basis of an increase or decrease of values.
Here, a simple change from color A to color B can be grasped as a
tendency in which, when the colors are mapped in a three-dimensional
color space, the distance from color A gradually increases and the
distance from color B gradually decreases. Accordingly, instead of the
two-dimensional array B storing the brightness values of the past
frames in FIG. 2, a two-dimensional array B' storing colors is used,
and if it is judged that the respective colors in B' are arranged such
that the color difference from B'1 increases and at the same time the
color difference from B'm decreases, a method similar to the case of
the brightness can be used thereafter.
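
The color variant can be sketched for a single pixel as follows. This is a minimal sketch, assuming RGB triples and Euclidean distance as the color difference (the specification does not name a color space or metric) and using a strict test without the margin that th1 provides in the brightness case. The function name is hypothetical.

```python
import math


def is_monotonic_color_change(colors):
    """Check one pixel's buffered colors B'1 to B'm, oldest first.

    Returns True when the distance from the oldest color never decreases
    and the distance to the newest color never increases.
    """
    first, last = colors[0], colors[-1]
    from_first = [math.dist(c, first) for c in colors]
    to_last = [math.dist(c, last) for c in colors]
    moving_away = all(p <= q for p, q in zip(from_first, from_first[1:]))
    closing_in = all(p >= q for p, q in zip(to_last, to_last[1:]))
    return first != last and moving_away and closing_in
```

A pixel that passes this test would be counted in eval in the same way as a pixel with monotonously changing brightness.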

[FIG. 2] Flowchart of the dissolve detection program (processes 200 to
240).

[FIG. 4] Flowchart of the dissolve detection method using the sum of
eval over the past n frames (processes 400 to 414).